Hongyu's tutorial about Python in earth science!

Hi! This is Hongyu's tutorial about Python in earth science!

I know there are numerous tutorials out there, but I would like to make this one more project-based, so that copying and pasting is a little bit easier.

Read a CSV file

CSV is a very frequently used data format in geoscience research, and it can easily be exported from Microsoft Excel or Kingsoft WPS.

This tutorial is going to teach you how to read a CSV file and make a scatter plot out of it.

Start the Jupyter Notebook

First things first: we would like to open Jupyter Notebook.

Open an Anaconda prompt, and then simply type the following (or just copy and paste it, then hit Enter)

conda activate Data_Analysis

If things are working properly, you would see something like

(Data_Analysis) PS C:\Users\Jeff>

The most important sign is the (Data_Analysis) prefix; it means your environment is active now!

Next, we start a new Jupyter Notebook!

type

jupyter notebook

You should see a browser window pop up with Jupyter Notebook running!
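Once the notebook is open, a first cell along these lines could confirm that the active environment has the packages used in this tutorial. This is only a sketch: the package list is an assumption based on the imports that appear in this tutorial, and Required_Packages and Package_Status are names made up for illustration.

```python
import importlib.util
import sys

# Packages imported in this tutorial (assumed list)
Required_Packages = ["pandas", "matplotlib", "plotly"]

# find_spec returns None when a package cannot be found in the active environment
Package_Status = {
    name: importlib.util.find_spec(name) is not None
    for name in Required_Packages
}

# The executable path shows which environment the notebook is running in
print("Python executable:", sys.executable)
for name, installed in Package_Status.items():
    print(name, "is installed" if installed else "is MISSING")
```

If the executable path does not contain Data_Analysis, the notebook was probably started from a different environment.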


What if you do not know how to set up your environment?

Please refer to this tutorial to set up the Anaconda environment!

Read the CSV file

After opening Jupyter Notebook, we want to load the CSV file into pandas.

In Jupyter Notebook, change the directory to the location where you stored the CSV file.

Start a new notebook (click the button labeled New in the upper right corner, then pick Jupyter Notebook).

Type

import pandas as pd (this simply says you want to use pandas, under the short name pd)

Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv') (this says you want to read a data file called Scatter_Plot_Example_Data.csv and store it in Sample_Data)

In [1]:
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
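If read_csv cannot find the file, the notebook is probably running in a different folder. The sketch below shows one way to check. It assumes the file is comma-separated with a header row, and it writes a three-row excerpt of the example data to a temporary folder that stands in for your own data folder, so that the example runs anywhere. Excerpt_Text, Data_Folder, CSV_Path and CSV_Files are illustrative names.

```python
import tempfile
from pathlib import Path

import pandas as pd

# First three rows of the example data, used here as a stand-in file
Excerpt_Text = (
    "X,Y,X_error,Y_error,Y_lower_error,Y_Upper_error\n"
    "1,2.659756,0.36,0.12,0.030,0.14\n"
    "2,2.497640,0.32,0.11,0.065,0.16\n"
    "3,1.726148,0.12,0.20,0.055,0.20\n"
)

# A temporary folder plays the role of the folder that holds your CSV file
Data_Folder = Path(tempfile.mkdtemp())
CSV_Path = Data_Folder / "Scatter_Plot_Example_Data.csv"
CSV_Path.write_text(Excerpt_Text)

# Path.cwd() is the folder the notebook is currently running in
print("Notebook folder:", Path.cwd())

# List the CSV files that are available in the data folder
CSV_Files = sorted(p.name for p in Data_Folder.glob("*.csv"))
print("CSV files found:", CSV_Files)

# read_csv accepts a Path object as well as a plain string
Sample_Data = pd.read_csv(CSV_Path)
print(Sample_Data.shape)
```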

Display the CSV file in Pandas/Python

I know it can be confusing when nothing appears on your screen and you do not know whether the data has been read or not.

The easiest way to check is to type Sample_Data (or whatever name you used previously).

In [2]:
Sample_Data # just like this 
Out[2]:
     X         Y  X_error  Y_error  Y_lower_error  Y_Upper_error
0    1  2.659756     0.36     0.12          0.030           0.14
1    2  2.497640     0.32     0.11          0.065           0.16
2    3  1.726148     0.12     0.20          0.055           0.20
3    4  0.334088     0.22     0.08          0.035           0.05
4    5  0.797335     0.18     0.16          0.095           0.14
5    6  1.074516     0.28     0.17          0.085           0.18
6    7  2.049046     0.18     0.08          0.045           0.05
7    8  2.903671     0.36     0.13          0.075           0.15
8    9  2.026484     0.36     0.20          0.100           0.08
9   10  1.268427     0.36     0.08          0.075           0.07
10  11  0.048695     0.24     0.10          0.040           0.18
11  12  0.793206     0.20     0.07          0.095           0.12
12  13  1.430861     0.12     0.11          0.050           0.10
13  14  2.733610     0.12     0.12          0.055           0.09
14  15  2.336767     0.36     0.19          0.070           0.08
15  16  1.588241     0.12     0.08          0.100           0.12
16  17  0.859570     0.30     0.20          0.055           0.08
17  18  0.557683     0.32     0.19          0.075           0.05
18  19  1.469991     0.34     0.09          0.055           0.17
19  20  2.263746     0.28     0.10          0.040           0.06
20  21  2.424670     0.24     0.19          0.080           0.14
21  22  1.963132     0.14     0.07          0.025           0.20
22  23  0.214029     0.10     0.19          0.080           0.18
23  24  1.086613     0.12     0.08          0.065           0.15
24  25  1.185143     0.26     0.17          0.025           0.20
25  26  1.912888     0.24     0.09          0.070           0.14
26  27  2.180408     0.12     0.15          0.095           0.15
27  28  1.819332     0.10     0.11          0.075           0.14
28  29  0.526857     0.10     0.16          0.035           0.08
29  30  0.966365     0.10     0.09          0.065           0.11

As you can see, we have 30 data points, each with its own error values, and they are all stored in Sample_Data!
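For a larger file, printing the whole table is not practical. The sketch below shows a few quicker checks that pandas offers. To stay self-contained, it rebuilds only the first three rows of the table above by hand; with your own data you would use the Sample_Data you already loaded.

```python
import pandas as pd

# Three-row excerpt of the table shown above
Sample_Data = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [2.659756, 2.497640, 1.726148],
    "X_error": [0.36, 0.32, 0.12],
    "Y_error": [0.12, 0.11, 0.20],
    "Y_lower_error": [0.030, 0.065, 0.055],
    "Y_Upper_error": [0.14, 0.16, 0.20],
})

print(Sample_Data.shape)          # (number of rows, number of columns)
print(list(Sample_Data.columns))  # column names
print(Sample_Data.head(2))        # first two rows only
print(Sample_Data.dtypes)         # data type of each column
```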


Alternative CSV reading

If you do not want to change directories in Jupyter Notebook every time you use it, or you have multiple data files at different locations on your computer or server, you can read the data directly from its actual path.

For example, if the file Scatter_Plot_Example_Data.csv is at the location "C:\Users\Jeff\Data\Scatter_Plot_Example_Data.csv"

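You could type the following (note the r in front of the quotes: it keeps Python from treating the backslashes in a Windows path as escape characters; without it, this path raises a SyntaxError)

import pandas as pd

Sample_Data = pd.read_csv(r'C:\Users\Jeff\Data\Scatter_Plot_Example_Data.csv')

This will load the Scatter_Plot_Example_Data.csv file as well.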
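A Windows path contains backslashes, which Python treats as the start of an escape sequence in an ordinary string. The sketch below compares three common ways of writing the path from the example above so that Python reads it correctly. It only builds and compares the path strings and does not open any file; Raw_Path, Escaped_Path and Forward_Path are illustrative names.

```python
from pathlib import PureWindowsPath

# Three ways to write the same Windows path in Python
Raw_Path = r'C:\Users\Jeff\Data\Scatter_Plot_Example_Data.csv'         # raw string
Escaped_Path = 'C:\\Users\\Jeff\\Data\\Scatter_Plot_Example_Data.csv'  # doubled backslashes
Forward_Path = 'C:/Users/Jeff/Data/Scatter_Plot_Example_Data.csv'      # forward slashes

# The raw and the escaped forms produce exactly the same string
print(Raw_Path == Escaped_Path)

# PureWindowsPath treats both kinds of slash as folder separators
print(PureWindowsPath(Forward_Path) == PureWindowsPath(Raw_Path))

# The last part of the path is the file name
print(PureWindowsPath(Raw_Path).name)
```

Any of the three strings can be passed to pd.read_csv on Windows.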

Scatter plot without error bars

If you want to make a scatter plot without error bars, pandas actually has a pretty neat function called plot.scatter

To use it and make the scatter plot, simply type

Sample_Data.plot.scatter(x="X",y="Y")

In [111]:
#!/usr/bin/env python

import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')

import matplotlib.pyplot as plot

Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y')

# Formatting the labels on the axis

Figure_Scatter_Plot.set_xlabel("Time")    # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
Out[111]:
Text(0.5, 1.0, 'Scatter Plot of Water Temperature vs Time')
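The label lines in the cell above work because plot.scatter returns a matplotlib Axes object, which carries the set_xlabel, set_ylabel and set_title methods. The sketch below checks this on a three-row excerpt of the example data. The matplotlib.use("Agg") line is an assumption added so that the code also runs outside a notebook; inside Jupyter it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; not needed inside Jupyter

import pandas as pd

# Three-row excerpt of the example data
Sample_Data = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [2.659756, 2.497640, 1.726148],
})

Figure_Scatter_Plot = Sample_Data.plot.scatter(x="X", y="Y")

# The returned object is a matplotlib Axes
print(type(Figure_Scatter_Plot).__name__)

# Every set_ method has a matching get_ method to read the value back
Figure_Scatter_Plot.set_xlabel("Time")
Figure_Scatter_Plot.set_ylabel("Water temperature")
print(Figure_Scatter_Plot.get_xlabel(), "|", Figure_Scatter_Plot.get_ylabel())
```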

Scatter plot with symmetric error bars

If you want to make a scatter plot with symmetric error bars, this can also be achieved with plot.scatter.

To do so, use the yerr option and give it the name of the column that holds the errors. If you want x error bars, use the xerr option in the same way.

Sample_Data.plot.scatter(x="X",y="Y",yerr='Y_error')

In [112]:
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')

import matplotlib.pyplot as plot

Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr='Y_error')

# Formatting the labels on the axis

Figure_Scatter_Plot.set_xlabel("Time")    # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
Out[112]:
Text(0.5, 1.0, 'Scatter Plot of Water Temperature vs Time')
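The xerr and yerr options can be combined in one call. The sketch below draws both on a three-row excerpt of the example data, using the X_error and Y_error columns. The matplotlib.use("Agg") line is an assumption added so that the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; not needed inside Jupyter

import pandas as pd

# Three-row excerpt of the example data
Sample_Data = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [2.659756, 2.497640, 1.726148],
    "X_error": [0.36, 0.32, 0.12],
    "Y_error": [0.12, 0.11, 0.20],
})

# xerr and yerr each take the name of the column that holds the error values
Figure_Scatter_Plot = Sample_Data.plot.scatter(
    x="X", y="Y", xerr="X_error", yerr="Y_error"
)

Figure_Scatter_Plot.set_xlabel("Time")
Figure_Scatter_Plot.set_ylabel("Water temperature")
```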

Scatter plot with symmetric error bars and grid on

A simple scatter plot can be good for illustration; however, making it look more organized often takes more time than actually making the plot! Here is an example.

We need to turn on the grid option

Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr='Y_error',grid=True)

In [120]:
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')

import matplotlib.pyplot as plot

Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr='Y_error',grid=True)

# Formatting the labels on the axis

Figure_Scatter_Plot.set_xlabel("Time")    # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
Out[120]:
Text(0.5, 1.0, 'Scatter Plot of Water Temperature vs Time')
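The grid can also be switched on after the plot is made, through the grid method of the returned Axes, which additionally accepts line-style options. The sketch below uses a dashed, half-transparent grid on a three-row excerpt of the example data; the style values are arbitrary choices for illustration, and the matplotlib.use("Agg") line is an assumption so that the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; not needed inside Jupyter

import pandas as pd

# Three-row excerpt of the example data
Sample_Data = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [2.659756, 2.497640, 1.726148],
    "Y_error": [0.12, 0.11, 0.20],
})

Figure_Scatter_Plot = Sample_Data.plot.scatter(x="X", y="Y", yerr="Y_error")

# Turn the grid on and style it: dashed lines, half transparent
Figure_Scatter_Plot.grid(True, linestyle="--", alpha=0.5)

Figure_Scatter_Plot.set_xlabel("Time")
Figure_Scatter_Plot.set_ylabel("Water temperature")
```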

Scatter plot with asymmetric error bars

If you want to make a scatter plot with asymmetric error bars, this can also be achieved with plot.scatter.

To do so, use the yerr option. If you want x error bars, use the xerr option in the same way.

When the error bars are asymmetric, we need to construct a structure for them, with the lower errors first and the upper errors second, like

[ [ Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error'] ] ]

Therefore, to make the scatter plot, type the following:

Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error']]])

In [117]:
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')

import matplotlib.pyplot as plot

Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error']]])

# Formatting the labels on the axis

Figure_Scatter_Plot.set_xlabel("Time")    # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
Out[117]:
Text(0.5, 1.0, 'Scatter Plot of Water Temperature vs Time')
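To see what the nested error structure looks like, the sketch below turns it into NumPy arrays and prints their shapes, then works out where each error bar starts and ends. It uses a three-row excerpt of the example data; Error_Pair, Error_Structure, Bar_Bottom and Bar_Top are names made up for illustration.

```python
import numpy as np
import pandas as pd

# Three-row excerpt of the example data
Sample_Data = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [2.659756, 2.497640, 1.726148],
    "Y_lower_error": [0.030, 0.065, 0.055],
    "Y_Upper_error": [0.14, 0.16, 0.20],
})

# Lower errors in the first row, upper errors in the second row
Error_Pair = np.array([Sample_Data["Y_lower_error"], Sample_Data["Y_Upper_error"]])
print(Error_Pair.shape)  # (2, number of points)

# The extra pair of brackets used with plot.scatter adds a leading axis of length 1
Error_Structure = np.array([[Sample_Data["Y_lower_error"], Sample_Data["Y_Upper_error"]]])
print(Error_Structure.shape)  # (1, 2, number of points)

# Each bar runs from Y minus the lower error to Y plus the upper error
Bar_Bottom = Sample_Data["Y"] - Sample_Data["Y_lower_error"]
Bar_Top = Sample_Data["Y"] + Sample_Data["Y_Upper_error"]
print(pd.DataFrame({"bottom": Bar_Bottom, "top": Bar_Top}))
```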

Scatter plot with asymmetric error bars and grid on

A simple scatter plot can be good for illustration; however, making it look more organized often takes more time than actually making the plot! Here is an example.

We need to turn on the grid option

Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'],Sample_Data['Y_Upper_error']]],grid=True,title='Scatter Plot of Water Temperature vs Time')

In [119]:
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')

import matplotlib.pyplot as plot

Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error']]],grid=True)

# Formatting the labels on the axis

Figure_Scatter_Plot.set_xlabel("Time")    # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
Out[119]:
Text(0.5, 1.0, 'Scatter Plot of Water Temperature vs Time')
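Since the cells above differ only in the yerr and grid options, they could be folded into one small function to make copying and pasting easier. The sketch below is one possible way to do that; make_scatter_plot is a hypothetical helper, not something pandas provides, and the matplotlib.use("Agg") line is an assumption so that the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; not needed inside Jupyter

import pandas as pd


def make_scatter_plot(data, yerr=None, grid=False):
    """Hypothetical helper: scatter plot of Y against X with the labels used in this tutorial."""
    axes = data.plot.scatter(x="X", y="Y", yerr=yerr, grid=grid)
    axes.set_xlabel("Time")
    axes.set_ylabel("Water temperature")
    axes.set_title("Scatter Plot of Water Temperature vs Time")
    return axes


# Three-row excerpt of the example data
Sample_Data = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [2.659756, 2.497640, 1.726148],
    "Y_error": [0.12, 0.11, 0.20],
    "Y_lower_error": [0.030, 0.065, 0.055],
    "Y_Upper_error": [0.14, 0.16, 0.20],
})

# Symmetric error bars, no grid
Symmetric_Plot = make_scatter_plot(Sample_Data, yerr="Y_error")

# Asymmetric error bars with the grid on
Asymmetric_Errors = [[Sample_Data["Y_lower_error"], Sample_Data["Y_Upper_error"]]]
Asymmetric_Plot = make_scatter_plot(Sample_Data, yerr=Asymmetric_Errors, grid=True)
```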

Scatter plot using Plotly

When you want a prettier scatter plot, there are actually a lot of choices out there in Python.

For example, here is a scatter plot made with the Plotly module

In [133]:
import plotly.express as plot_scatter_simple

Plotly_Figure = plot_scatter_simple.scatter(Sample_Data, x="X", y="Y",error_y="Y_error")

Plotly_Figure.update_traces(mode='markers', marker_line_width=2, marker_size=6,marker_color='blue')
Plotly_Figure.update_layout(title='Plotly Scatter Plot!',title_font_size=20)

Plotly_Figure.show()